1,519 research outputs found

    Compressing and indexing aligned readsets

    Get PDF
    Compressed full-text indexes are one of the main success stories of bioinformatics data structures but even they struggle to handle some DNA readsets. This may seem surprising since, at least when dealing with short reads from the same individual, the readset will be highly repetitive and, thus, highly compressible. If we are not careful, however, this advantage can be more than offset by two disadvantages: first, since most base pairs are included in at least tens reads each, the uncompressed readset is likely to be at least an order of magnitude larger than the individual's uncompressed genome; second, these indexes usually pay some space overhead for each string they store, and the total overhead can be substantial when dealing with millions of reads. The most successful compressed full-text indexes for readsets so far are based on the Extended Burrows-Wheeler Transform (EBWT) and use a sorting heuristic to try to reduce the space overhead per read, but they still treat the reads as separate strings and thus may not take full advantage of the readset's structure. For example, if we have already assembled an individual's genome from the readset, then we can usually use it to compress the readset well: e.g., we store the gap-coded list of reads' starting positions; we store the list of their lengths, which is often highly compressible; and we store information about the sequencing errors, which are rare with short reads. There is nowhere, however, where we can plug an assembled genome into the EBWT. In this paper we show how to use one or more assembled or partially assembled genome as the basis for a compressed full-text index of its readset. Specifically, we build a labelled tree by taking the assembled genome as a trunk and grafting onto it the reads that align to it, at the starting positions of their alignments. Next, we compute the eXtended Burrows-Wheeler Transform (XBWT) of the resulting labelled tree and build a compressed full-text index on that. Although this index can occasionally return false positives, it is usually much more compact than the alternatives. Following the established practice for datasets with many repetitions, we compare different full-text indices by looking at the number of runs in the transformed strings. For a human Chr19 readset our preliminary experiments show that eliminating separators characters from the EBWT reduces the number of runs by 19%, from 220 million to 178 million, and using the XBWT reduces it by a further 15%, to 150 million

    A compact index for order-preserving pattern matching

    Get PDF
    Order-preserving pattern matching has been introduced recently, but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same relative order as the pattern elements. For this problem, we consider the offline version in which we build an index for the reference sequence so that subsequent searches can be completed very efficiently. We propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on the new approach of decomposing the indexed sequence into an order component, containing ordering information, and a \u3b4 component, containing information on the absolute values. Experiments show that this approach is viable, is faster than the available alternatives, and is the first one offering simultaneously small space usage and fast retrieval

    High-Order IMEX-RK Finite Volume Methods for Multidimensional Hyperbolic Systems.

    Get PDF
    In this paper we present a high-order accurate cell-centered finite volume method for the semi-implicit discretization of multidimensional hyperbolic systems in conservative form on unstructured grids. This method is based on a special splitting of the physical flux function into a convective and a non-convective part. The convective contribution to the global flux is treated implicitly by mimicking the upwinding of a scalar linear flux function while the rest of the flux is discretized in an explicit way. The spatial accuracy is ensured by allowing non-oscillatory polynomial reconstruction procedures, while the time accuracy is attained by adopting a Runge-Kutta stepping scheme. The method can be naturally considered in the framework of the textbf{IM}plicit-textbf{EX}plicit (IMEX) schemes and the properties of the resulting operators are analysed using the properties of M-matrices

    Scheduling Jobs in Flowshops with the Introduction of Additional Machines in the Future

    Get PDF
    This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Elsevier and can be found at: http://www.journals.elsevier.com/expert-systems-with-applications/.The problem of scheduling jobs to minimize total weighted tardiness in flowshops,\ud with the possibility of evolving into hybrid flowshops in the future, is investigated in\ud this paper. As this research is guided by a real problem in industry, the flowshop\ud considered has considerable flexibility, which stimulated the development of an\ud innovative methodology for this research. Each stage of the flowshop currently has\ud one or several identical machines. However, the manufacturing company is planning\ud to introduce additional machines with different capabilities in different stages in the\ud near future. Thus, the algorithm proposed and developed for the problem is not only\ud capable of solving the current flow line configuration but also the potential new\ud configurations that may result in the future. A meta-heuristic search algorithm based\ud on Tabu search is developed to solve this NP-hard, industry-guided problem. Six\ud different initial solution finding mechanisms are proposed. A carefully planned\ud nested split-plot design is performed to test the significance of different factors and\ud their impact on the performance of the different algorithms. To the best of our\ud knowledge, this research is the first of its kind that attempts to solve an industry-guided\ud problem with the concern for future developments

    Comet Machholz (C/2004 Q2): morphological structures in the inner coma and rotation parameters

    Full text link
    Extensive observations of comet C/2004 Q2 (Machholz) were carried out between August 2004 and May 2005. The images obtained were used to investigate the comet's inner coma features at resolutions between 350 and 1500 km/pixel. A photometric analysis of the dust outflowing from the comet's nucleus and the study of the motion of the morphological structures in the inner coma indicated that the rotation period of the nucleus was most likely around 0.74 days. A thorough investigation of the inner coma morphology allowed us to observe two main active sources on the comet's nucleus, at a latitude of +85{\deg} \pm 5{\deg} and +45{\deg} \pm 5{\deg}, respectively. Further sources have been observed, but their activity ran out quite rapidly over time; the most relevant was at latcom. = 25{\deg} \pm 5{\deg}. Graphic simulations of the geometrical conditions of observation of the inner coma were compared with the images and used to determine a pole orientation at RA=95{\deg} \pm 5{\deg}, Dec=+35{\deg} \pm 5{\deg}. The comet's spin axis was lying nearly on the plane of the sky during the first decade of December 2004.Comment: 29 pages, 8 figures, 3 table

    A new class of string transformations for compressed text indexing

    Get PDF
    Introduced about thirty years ago in the field of data compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the design of efficient self-indexing compressed data structures. Finding other string transformations with the same remarkable properties of BWT has been a challenge for many researchers for a long time. In this paper, we introduce a whole class of new string transformations, called local orderings-based transformations, which have all the “myriad virtues” of BWT. As a further result, we show that such new string transformations can be used for the construction of the recently introduced r-index, which makes them suitable also for highly repetitive collections. In this context, we consider the problem of finding, for a given string, the BWT variant that minimizes the number of runs in the transformed string

    The Alternating BWT: An algorithmic perspective

    Get PDF
    The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the novel transform ABWT. We establish a deep analogy between BWT and ABWT by proving they are the only ones in the above mentioned class to be rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that also the ABWT can be used as a basis for efficient compressed full text indices. Finally, we prove that the ABWT can be efficiently computed by using a combination of the Difference Cover suffix sorting algorithm (K\ue4rkk\ue4inen et al., 2006 [28]) with a linear time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order
    • …
    corecore